Framework for real-time clustering over sliding windows

نویسندگان

  • Sobhan Badiozamany
  • Kjell Orsborn
  • Tore Risch
چکیده

Clustering queries over sliding windows require maintaining cluster memberships that change as windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) utilizes a generation based window maintenance approach where windows are maintained over different time intervals. It provides algorithm independent and efficient sliding mechanisms for clustering queries where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge for real-time detection of a high number of fastly evolving clusters is efficiently supporting smooth re-clustering in real-time, i.e. to minimize the sliding time with increasing window size and decreasing strides. To efficiently support such re-clustering for clustering algorithms where deletion of expired data is not supported, e.g. BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

متن کامل

DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window

Evolving data streams are ubiquitous. Various clustering algorithms have been developed to extract useful knowledge from evolving data streams in real time. Density-based clustering method has the ability to handle outliers and discover arbitrary shape clusters whereas grid-based clustering has high speed processing time. Sliding window is a widely used model for data stream mining due to its e...

متن کامل

Semi-Markov Clustering

Clustering of subsequences of time series is a widely studied and applied class of techniques [3, 4, 2, 6, 1, 5, 7] which aims at interpreting a time series as a shorter sequence of symbols, each of which represent a segment of the series with a pattern that is similar to other segments that has been assigned the same symbol. The research field of subsequence clustering, which was already a wid...

متن کامل

Hcluwin: an Algorithm for Clustering Heterogeneous Data Streams over Sliding Windows

Many applications in web usage mining, such as business intelligence and usage characterization, require effective and efficient techniques to discover the users with similar usage patterns and the web pages with correlate contents in the physical world. Clustering click streams can help to achieve the goal. Despite the high processing rate, the existing methods for clustering click streams ove...

متن کامل

Counting distinct objects over sliding windows

Aggregation against distinct objects has been involved in many real applications with the presence of duplicates, including real-time monitoring moving objects. In this paper, we investigate the problem of counting distinct objects over sliding windows with arbitrary lengths. We present novel, time and space efficient, one scan algorithms to continuously maintain a sketch so that the counting c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016